Abstract

For finite geometry low-density parity-check codes, the heavy row and column weights of the parity
check matrix make decoding computationally expensive even with Min-Sum (MS) variants. To
alleviate this, we present a class of hybrid schemes that concatenate a parallel bit flipping (BF) variant
with an MS variant. Simulation results show that, in most of the SNR region of interest, the proposed
hybrid schemes save substantial computational complexity relative to MS variant decoding alone,
without compromising performance or convergence rate. Specifically, the BF variant,
with much lower computational complexity, bears most of the decoding load before resorting to the MS variant.
Computational and hardware complexity are also discussed to justify the feasibility of the hybrid schemes.




Hybrid Decoding of Finite Geometry LDPC Codes

I. Introduction 

Low-density parity-check (LDPC) codes, given a sufficiently long block length, can approach the
Shannon limit with belief propagation (BP) decoding [1][2]. Hence, they remain a research focus
in the coding field. Lately, a class of finite geometry (FG) LDPC codes has
attracted great interest because these codes are encodable in linear time with feedback
shift registers [3][4]. However, compared with other classical LDPC codes, FG-LDPC codes require
much more complexity to decode with standard BP algorithms, due to the heavy row and
column weights of their parity check matrix.

Many low complexity decoding schemes are applicable to FG-LDPC codes. Hard-decision
decoding [5][6] has the least complexity but suffers severe performance loss. To alleviate the
degradation at the cost of moderate complexity, a class of bit flipping (BF) variants improves
performance by taking into account the soft information of received sequences. In [7], a BF
function was devised wherein both the most and the least reliable bits involved in one check sum
are considered. Further improvement was reported in [8] by weighting each term in the BF function.
A bootstrapping step [9][10] was proposed to update unreliable bits prior to calculating their
BF function values. Based on [3], the methods presented in [11][12] achieved better performance
by considering the impact of a bit's own received soft information on its BF function value.
However, one common drawback of the above BF variants is that only one bit is
flipped per iteration, which works against fast convergence. To lower the decoding
latency caused by such a serial BF strategy, [13], [14] and [15] presented three decoding methods
that flip multiple bits per iteration. In [13], a bit is flipped as soon as its flipping signal
counter has reached a predesigned threshold; in [14], the number
of bits chosen to be flipped approximates the quotient of the number of unsatisfied check sums
and the column weight of the parity check matrix; in [15], it was suggested to flip all bits with
positive flipping function values in each iteration. Further decoding gain is obtained by adding to
these multi-bit flipping algorithms a delay-handling procedure [16], which delays flipping those
bits whose soft information has comparatively high magnitude. Compared with serial
flipping, these parallel or multi-bit flipping methods show a significant convergence advantage
without performance loss.

On the other hand, substantial complexity is saved by approximating the complex tanh function in
standard BP with a simple min function, which leads to the Min-Sum (MS) or BP-based algorithm
[17][18]. MS variants such as normalized Min-Sum (NMS) and offset Min-Sum (OMS)
[19] then prove effective in closing most of the performance gap between MS and standard BP, at the
cost of a minor complexity increase.

Despite this, the heavy row and column weights of FG-LDPC codes still burden the MS variants
in terms of complexity, while the BF variants require much less complexity but suffer
some performance loss. To obtain good performance and low complexity simultaneously, one
natural way is to concatenate component decoders to carry out one decoding. This strategy
was attempted in [15], wherein standard BP is called only when a multi-bit BF scheme fails.
However, due to the serious performance mismatch between standard BP and the multi-bit BF
scheme proposed in [15], such concatenation results in frequent invocations of standard BP in
most of the waterfall SNR region, which weakens the effort to reduce complexity.
In contrast, the gear shift decoding presented in [20] selects the appropriate decoder
among the available ones at each iteration, according to the optimal trellis route obtained from
extrinsic information transfer (EXIT) chart analysis. Theoretically, gear shift decoding reaches
the target of reducing decoding latency while keeping performance. But several obstacles hinder
its application to finite-length FG-LDPC codes. For one thing, the delicate optimal decoding route
derived from EXIT chart analysis may deviate seriously from the real situation, since EXIT
chart analysis is accurate largely for codes of large girth, whereas FG-LDPC codes are known for
their abundant short loops. For another, the EXIT chart of BF variants remains unknown,
and excluding this class of decoding schemes may deprive gear shift decoding of a competitive
decoder option.

In this paper, we adopt a framework similar to that of [15], where two component
decoders form a hybrid scheme. The former component decoder may be substituted with a
newly proposed BF variant; the latter is an MS variant instead of standard BP, considering that near
BP performance is achieved with such an MS variant. In the moderate and high SNR regions, the scheme
achieves both the decoding performance of the latter decoder and a low computational complexity close
to that of the former decoder, as verified via simulations and complexity analysis.

The remainder of the paper is organized as follows. Section II discusses the motivation for
designing such a class of hybrid schemes. Section III describes its implementation using BF
and MS variants. Simulation results, convergence rate and complexity analysis are presented in
Section IV. Finally, Section V concludes the work.

II. Motivation of hybrid decoding 

With the goals of high performance and low complexity, a satisfactory concatenation of two
component decoders meets four conditions. First, the two decoders present distinct characteristics.
Specifically, the former requires much less complexity than the latter. Secondly, the performance
gap between them is within some limit to ensure a performance match. In other words, while no
gap at all removes the need for a hybrid scheme, an excessively large gap, manifested by waterfall
regions of the two decoders that do not overlap well, results in frequent invocations of the second
component decoder. Thirdly, in order not to worsen the overall decoding latency, it is beneficial
that the convergence rates of the two decoders are by and large comparable. Lastly, the hardware
of both decoders is shared to the greatest extent to lower implementation cost.

In [15], a multi-bit flipping scheme and standard BP are joined to serve the purpose of
decoding. However, the multi-bit flipping scheme suffers serious performance loss when compared with
standard BP. Such a concatenation violates condition two above and is less meaningful,
since standard BP still takes a substantial load in most of the SNR region. Compared with the serial ones,
the multi-bit BF variants require far fewer decoding iterations [16]. According to condition
three, a multi-bit BF variant is thus preferable to a serial one when selecting the first component
decoder. Moreover, the multi-bit BF variant with the least complexity and the closest performance
to its successor has the highest priority. On the other hand, for FG-LDPC codes, MS variants
with proper correcting factors present almost the same performance as standard BP and are thus good
candidates for the second component decoder.

III. Implementation of hybrid decoders 

Assume a binary $(N, K)$ LDPC code with block length $N$ and dimension $K$. Its parity check
matrix is of the form $H_{M \times N}$, where $M$ is the number of check sums. For high rate FG-LDPC
codes, the relation $M = N$ indicates that there exist many redundant check sums in $H$. BPSK
modulation maps a codeword $c = [c_1, c_2, \ldots, c_N]$ to a symbol sequence $x = [x_1, x_2, \ldots, x_N]$ with
$x_i = 1 - 2c_i$, where $i = 1, 2, \ldots, N$. After the symbols are transmitted through a memoryless additive
white Gaussian noise (AWGN) channel, we obtain at the receiver a corrupted sequence
$y = [y_1, y_2, \ldots, y_N]$, where $y_i = x_i + z_i$ and $z_i$ is an independent Gaussian random variable with
zero mean and variance $\sigma^2$.
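
The channel model above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names `bpsk_awgn` and `hard_decision` are hypothetical and do not come from the paper.

```python
import random

def bpsk_awgn(codeword, sigma, rng=None):
    """Map bits to BPSK symbols x_i = 1 - 2*c_i and add Gaussian noise."""
    rng = rng or random.Random()
    symbols = [1 - 2 * bit for bit in codeword]            # 0 -> +1, 1 -> -1
    received = [x + rng.gauss(0.0, sigma) for x in symbols]
    return symbols, received

def hard_decision(received):
    """Decide bit 1 when the received value is negative, otherwise bit 0."""
    return [1 if y < 0 else 0 for y in received]

if __name__ == "__main__":
    _, y = bpsk_awgn([0, 1, 1, 0], sigma=0.57, rng=random.Random(1))
    print(hard_decision(y))
```

The magnitude $|y_i|$ of each received value serves as the reliability measure used by the BF variants below.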

For convenience, the vectors below are treated as column or row vectors depending on the
context. To differentiate the BF variants, each is named by the surname initials of its first two
authors, hyphenated with the letters "WBF", unless stated otherwise.

A. BF variants 

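In LP-WBF [7], the BF function of variable node $i$ at the $l$-th iteration is defined as

$$f_i^{(l)} = \sum_{k \in \mathcal{M}(i)} f_{i,k}^{(l)} \tag{1}$$

$$f_{i,k}^{(l)} = \begin{cases} |y_i| - \frac{1}{2}\left(\min_{j \in \mathcal{N}(k)} |y_j|\right) & \text{if } s_k^{(l)} = 0, \\ |y_i| - \frac{1}{2}\left(\min_{j \in \mathcal{N}(k)} |y_j|\right) - \max_{j \in \mathcal{N}(k)} |y_j| & \text{if } s_k^{(l)} = 1, \end{cases} \tag{2}$$

where $\mathcal{M}(i)$ denotes the neighboring check nodes of variable node $i$, $\mathcal{N}(k)$ denotes the neighboring
variable nodes of check node $k$, and $s_k^{(l)}$ is the $k$-th component of the syndrome $s$ at the $l$-th iteration.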

With the intuition that the more reliable bits are involved in a check sum, the more reliable the
check will be, SZ-WBF [8] defines a BF function by weighting each term of the summation (1).
That is,

$$f_i^{(l)} = \sum_{k \in \mathcal{M}(i)} w_{i,k} f_{i,k}^{(l)} \tag{3}$$

$$w_{i,k} = \max\left(0,\; \alpha_1 - \left\| \{ j \mid |y_j| < \beta_1,\; j \in \mathcal{N}(k) \backslash i \} \right\| \right) \tag{4}$$

where $\mathcal{N}(k) \backslash i$ denotes the neighboring variable nodes of check node $k$ except variable node $i$,
$\alpha_1$ is an integer constant, $\beta_1$ is a real constant, and $\| \cdot \|$ denotes set cardinality.

For serial BF variants such as SZ-WBF, only the one bit with the smallest $f_i^{(l)}$ is flipped at the $l$-th
iteration. Hence, the maximum number of iterations needs to be set high enough to
allow decoding convergence.

Due to a positive correlation between the number of erroneous bits and that of unsatisfied
check sums, NT-WBF [14] suggests flipping the $\lambda^{(l)}$ bits with the smallest $f_i^{(l)}$ defined by (1) at the
$l$-th iteration,

$$\lambda^{(l)} = \left\lfloor \frac{W_h(s^{(l)})}{d_v} \right\rfloor$$

where $W_h(\cdot)$ denotes the calculation of Hamming weight, $d_v$ is the column weight of matrix $H$, and
$\lfloor x \rfloor$ is the integral part of $x$.

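At each iteration, LZ-WBF [15] flips all the bits whose flipping function values are greater than
zero, where the flipping function is defined as in [11],

$$f_i^{(l)} = \sum_{k \in \mathcal{M}(i)} \left(2 s_k^{(l)} - 1\right)\left(\min_{j \in \mathcal{N}(k)} |y_j|\right) - \beta_2 |y_i|, \quad i \in [1, N] \tag{5}$$

where $\beta_2$ is a real weighting factor.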
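
One parallel iteration of this kind can be sketched as follows. The sketch assumes the flipping function takes the form $f_i = \sum_{k \in \mathcal{M}(i)} (2 s_k - 1) \min_{j \in \mathcal{N}(k)} |y_j| - \beta_2 |y_i|$ and that each check is stored as a list of bit indices; the function names and this data layout are illustrative choices, not taken from [15].

```python
def syndrome(checks, bits):
    """Parity of each check sum; checks[k] lists the bit indices in check k."""
    return [sum(bits[j] for j in check) % 2 for check in checks]

def lz_wbf_iteration(checks, y, bits, beta2):
    """One parallel iteration: flip every bit whose flipping function is > 0."""
    s = syndrome(checks, bits)
    f = [-beta2 * abs(v) for v in y]                      # the -beta2 * |y_i| term
    for k, check in enumerate(checks):
        least_reliable = min(abs(y[j]) for j in check)    # min over N(k) of |y_j|
        for i in check:
            f[i] += (2 * s[k] - 1) * least_reliable       # +min if unsatisfied, -min if satisfied
    new_bits = list(bits)
    for i, value in enumerate(f):
        if value > 0:
            new_bits[i] ^= 1
    return new_bits, f

if __name__ == "__main__":
    # Toy example: bit 0 takes part in three checks and was received unreliably.
    checks = [[0, 1, 2], [0, 3, 4], [0, 5, 6]]
    y = [-0.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    bits = [1 if v < 0 else 0 for v in y]
    print(lz_wbf_iteration(checks, y, bits, beta2=1.5)[0])
```

No selection step is needed here, which is why this variant contributes nothing to the "select bit(s) to flip" part of the complexity.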

WZ-WBF [13] uses the same BF function as in [12], namely,

$$f_i^{(l)} = \sum_{k \in \mathcal{M}(i)} \left(2 s_k^{(l)} - 1\right) \min_{j \in \mathcal{N}(k) \backslash i} |y_j| - \beta_3 |y_i|, \quad i \in [1, N] \tag{6}$$

where $\beta_3$ is a real weighting factor. Then at each iteration, for each unsatisfied check sum, a
flipping signal is assigned to one of the involved bits. Only those bits are flipped which have
accumulated more flipping signals than a threshold.

To prevent some reliable bits from flipping hastily, improved parallel weighted bit flipping
(IPWBF) [16] adds a delay-handling step to the steps of WZ-WBF.

Compared with IPWBF, the proposed LF-WBF differs by using the BF function of SZ-WBF,
while keeping the other steps largely unchanged. To be self-contained, LF-WBF is described
as follows:

1) Preprocess: Let the threshold $T$ be the value of the $\lfloor \beta_4 N \rfloor$-th smallest element of the
array $|y_i|, i \in [1, N]$, where $\beta_4$ is a real constant within $[0, 1]$. Bits with $|y_i|$
greater than $T$ are marked reliable, the others unreliable.

2) Initialize: $l \leftarrow 0$; calculate the initial values $f_i^{(0)}, i \in [1, N]$ according to (2), (3). For the
bits in $\{i \mid |y_i| > T, i \in [1, N]\}$, set the delay-handling counters $a_i \leftarrow 0$; denote the hard decision of
$y$ as $c^{(0)}$.

3) Syndrome and reset: Calculate $s^{(l)} = H c^{(l)}$. If $s^{(l)} = 0$, stop and return $c^{(l)}$ as the decoding
result. If not, set $b_i \leftarrow 0, i \in [1, N]$, where $b_i$ is a flipping counter that sums the flipping signals
for bit $i$.

4) Collect flipping signals: Update $f_i^{(l)}, i \in [1, N]$ based on (2), (3). For each $k \in \{k \mid s_k^{(l)} \neq 0, k \in [1, M]\}$,
identify the index $i^* = \arg\min_{j \in \mathcal{N}(k)} f_j^{(l)}$, then $b_{i^*} \leftarrow b_{i^*} + 1$; that is, a
flipping signal is collected for bit $i^*$.

5) Decide flipping bits: This step is divided into two substeps.

a) For the bits in $\{i \mid b_i \ge \alpha_2, i \in [1, N]\}$, where the positive integer $\alpha_2$ is the
flipping threshold, flip them only if the resulting syndrome $s^{(l+1)} = 0$. Otherwise
turn to the next substep.

b) Delay-handling: Put the unreliable bits in $\{i \mid b_i \ge \alpha_2, |y_i| \le T, i \in [1, N]\}$
in a to-be-flipped list; for the reliable bits in $\{i \mid b_i \ge \alpha_2, |y_i| > T, i \in [1, N]\}$, update
$a_i \leftarrow a_i + 1$. Subsequently, put the bits in $\{i \mid a_i \ge \alpha_3, i \in [1, N]\}$ in the to-be-flipped
list, where $\alpha_3$ is a small positive integer defining a delay-handling threshold.
Obviously, it is meaningful only for $\alpha_3 \ge 2$. Relax $\alpha_2 \leftarrow \alpha_2 - 1$ only if the to-be-flipped
list is empty, then flip the bits in $\{i \mid b_i = \alpha_2, i \in [1, N]\}$. Declare failure if no
bit qualifies even then.

Since the delay-handling step may increase the average number of decoding iterations,
substep a) reduces its impact effectively.

6) Flip and reset: Flip the bits in the to-be-flipped list. Reset all the bits in $\{i \mid a_i \ge \alpha_3, i \in [1, N]\}$
by $a_i \leftarrow 0$. Notably, before the next reset occurs, the lifetime of $a_i$ may
span several iterations while that of $b_i$ is always one iteration.

7) $l \leftarrow l + 1$. If $l < I_m$, go to step 3 to continue with one more iteration; otherwise, declare failure.
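
The signal collection of step 4 and the delay-handling of steps 5b and 6 can be sketched as below. This is a simplified reading, not the authors' implementation: it assumes the thresholds are compared with "greater than or equal", omits the trial flip of substep a), relaxes the flipping threshold at most once, and uses hypothetical function names. An empty result corresponds to declaring failure.

```python
def collect_flipping_signals(checks, s, f):
    """Step 4: each unsatisfied check votes for its bit with the smallest f."""
    b = [0] * len(f)
    for k, check in enumerate(checks):
        if s[k] != 0:
            i_star = min(check, key=lambda j: f[j])
            b[i_star] += 1
    return b

def choose_bits_to_flip(b, abs_y, T, a, alpha2, alpha3):
    """Steps 5b and 6; the delay counters `a` are updated in place."""
    n = len(b)
    to_flip = set()
    for i in range(n):
        if b[i] >= alpha2:
            if abs_y[i] <= T:          # unreliable bit: flip right away
                to_flip.add(i)
            else:                      # reliable bit: postpone the flip
                a[i] += 1
    for i in range(n):
        if a[i] >= alpha3:             # delayed long enough, flip and reset
            to_flip.add(i)
            a[i] = 0
    if not to_flip and alpha2 > 1:     # assumed guard: relax the threshold once
        to_flip = {i for i in range(n) if b[i] == alpha2 - 1}
    return sorted(to_flip)

if __name__ == "__main__":
    a = [0, 0, 0]
    # Bit 0 is reliable, bit 1 is unreliable; both collected two signals.
    print(choose_bits_to_flip([2, 2, 0], [0.9, 0.1, 0.5], 0.3, a, 2, 2))  # first iteration
    print(choose_bits_to_flip([2, 2, 0], [0.9, 0.1, 0.5], 0.3, a, 2, 2))  # second iteration
```

In the usage example the reliable bit is flipped only after it has qualified in two iterations, whereas the unreliable bit is flipped immediately, which is the intended effect of delay-handling.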

B. MS variants 

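At the check node end, compared with standard BP implemented in the log-likelihood ratio
(LLR) domain, NMS and OMS [19] approximate (7) with (8) and (9), respectively, thus saving
most of the complexity.

$$L_{ji}^{(l)} = 2 \tanh^{-1}\left( \prod_{k \in \mathcal{N}(j) \backslash i} \tanh \frac{z_{kj}^{(l-1)}}{2} \right) \tag{7}$$

$$L_{ji}^{(l)} = \frac{1}{\beta_5} \prod_{k \in \mathcal{N}(j) \backslash i} \mathrm{sgn}\left(z_{kj}^{(l-1)}\right) \min_{k \in \mathcal{N}(j) \backslash i} \left|z_{kj}^{(l-1)}\right| \tag{8}$$

$$L_{ji}^{(l)} = \prod_{k \in \mathcal{N}(j) \backslash i} \mathrm{sgn}\left(z_{kj}^{(l-1)}\right) \max\left( \min_{k \in \mathcal{N}(j) \backslash i} \left|z_{kj}^{(l-1)}\right| - \beta_6,\; 0 \right) \tag{9}$$

where $L_{ji}^{(l)}$ denotes the message sent from check node $j$ to variable node $i$ at the $l$-th iteration;
$z_{kj}^{(l-1)}$ denotes the message sent from variable node $k$ to check node $j$ at the $(l-1)$-th iteration;
$\beta_5$ or $\beta_6$, being a real constant, functions as a scaling or offset factor, respectively.

To further reduce complexity, at the variable node end, the calculation of (10) is approximated
with (11) in the normalized APP-based (NAB) algorithm [19].

$$z_{ij}^{(l)} = F_i + \sum_{k \in \mathcal{M}(i) \backslash j} L_{ki}^{(l)} \tag{10}$$

$$z_{ij}^{(l)} = F_i + \sum_{k \in \mathcal{M}(i)} L_{ki}^{(l)}, \quad j \in \mathcal{M}(i) \tag{11}$$

where $F_i$ is the initial LLR of bit $i$. For the difference-set cyclic (DSC) codes, it was reported that
NAB yields almost as good performance as NMS [19]. As shown in the simulation later, a similar
observation also holds for FG-LDPC codes.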
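
To make the check node simplification concrete, the following sketch computes the normalized Min-Sum messages leaving one check node. It assumes the scaling factor enters as a division by $\beta_5 \ge 1$, which agrees with the factors larger than one listed for NMS and with the divisions counted in the complexity analysis; the function name is hypothetical.

```python
def nms_check_update(incoming, beta5):
    """Normalized Min-Sum messages from one check node to each of its neighbors.

    incoming[i] is the variable-to-check message from the i-th neighbor; the
    outgoing message to neighbor i excludes that neighbor's own input.
    Requires at least two neighbors.
    """
    magnitudes = [abs(z) for z in incoming]
    order = sorted(range(len(incoming)), key=lambda i: magnitudes[i])
    smallest, second = order[0], order[1]
    total_sign = 1
    for z in incoming:
        if z < 0:
            total_sign = -total_sign
    outgoing = []
    for i, z in enumerate(incoming):
        sign = total_sign if z >= 0 else -total_sign    # divide out the own sign
        least = magnitudes[second] if i == smallest else magnitudes[smallest]
        outgoing.append(sign * least / beta5)
    return outgoing

if __name__ == "__main__":
    print(nms_check_update([2.0, -1.0, 4.0], beta5=2.0))
```

Only the two smallest magnitudes and the overall sign are needed per check, which is where the saving relative to the tanh rule comes from.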



C. Block graph of a hybrid decoding scheme 



Fig. 1 Block graph of hybrid decoding scheme (Input, LF-WBF, MS variant; each stage exits on successful decoding)

There are many BF variants and MS variants, so the possible combinations of a BF variant with an
MS variant are abundant. Taking 'LF-WBF+NMS' as an instance, as shown in Fig. 1, the two component
decoders are largely independent. The latter takes over decoding whenever the former
fails.
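
The control flow of such a concatenation is simple enough to sketch directly. The interface below, in which each component decoder is a callable returning a success flag and a codeword estimate, is an assumption made for illustration only.

```python
def hybrid_decode(y, bf_decoder, ms_decoder):
    """Run the BF variant first; call the MS variant only if the BF stage fails.

    Each decoder takes the received sequence and returns (success, estimate).
    The second return value names the stage that produced the result.
    """
    success, estimate = bf_decoder(y)
    if success:
        return estimate, "BF"
    success, estimate = ms_decoder(y)
    return (estimate, "MS") if success else (estimate, "failure")

if __name__ == "__main__":
    # Stub decoders standing in for LF-WBF and NMS.
    failing_bf = lambda y: (False, None)
    working_ms = lambda y: (True, [0] * len(y))
    print(hybrid_decode([0.9, 1.1, -0.2], failing_bf, working_ms))
```

Because the second stage restarts from the received sequence, the two decoders need not share any state.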



D. Optimize parameters by differential evolution 

It is hard to optimize the group of parameters involved in LF-WBF theoretically. Hence,
differential evolution (DE), a heuristic search method, is used to approximate
the optimum. Similar to the genetic algorithm, DE is a simple and reliable optimization tool
[21]. In DE, through operations including mutation, recombination and selection, a population
of solution vectors is updated generation by generation, with the new vectors of smaller
objective value surviving, until the population converges, ideally to the global optimum.

To help LF-WBF optimize its parameter vector $(\alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_4)$, the objective function of
DE is designated to find the minimum bit error rate (BER) given a block of received sequences.
In order to save computation, each parameter of LF-WBF is roughly assigned an evaluation
interval beforehand. For instance, $\alpha_1, \alpha_2$ are integers in $[1, d_v/2]$, $\alpha_3$ is a small positive integer in $[1, 4]$,
and $\beta_1, \beta_4$ are real numbers in $[0, 1]$.

For the (273,191) and (1023,781) FG-LDPC codes [3], the DE results are given in Table-I for
varied channel variance $\sigma^2$.


IV. Simulation Results and Discussion 

A. Parameters selection 

It is verified that the decoding performance of LF-WBF is largely insensitive to minor changes
of its parameters. Thus, over the whole SNR region, we adopt the settings shown in the first row of
Table-II, chosen after referring to Table-I. An additional advantage of such simplification is that the
overall hybrid decoding requires no a priori information about the channel, thereby retaining
the uniformly most powerful (UMP) property [18] of MS variants.

For LZ-WBF and WZ-WBF, the data presented in Table-II come from the existing literature,
as indicated in the last column of Table-II.

After applying DE to the MS variants, we select the settings in the last three rows of Table-II
for NAB, NMS and OMS. Notably, the optimized scaling factors for NAB
and NMS differ.

Table-I : Parameters optimization of LF-WBF for (273,191) (left) and 
(1023,781) (right) FG-LDPC codes using differential evolution 
Table-II : Parameter settings of various decoding
schemes for (273,191) and (1023,781) FG-LDPC codes

Scheme    Parameter(s): those for (273,191); those for (1023,781)                                                Source
LF-WBF    $(\alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_4)$ = (6, 4, 2, 0.45, 0.07); (8, 7, 2, 0.4, 0.04)       DE
LZ-WBF    $\beta_2$ = 1.5; 2.1                                                                                   [15]
WZ-WBF    $(\alpha_2, \beta_3)$ = (4, 1.3); (10, 1.8)                                                            [22]
SZ-WBF    $(\alpha_1, \beta_1)$ = N/A; (9, 0.5)                                                                  [8]
NAB       $\beta_5$ = 5.7; 7.1                                                                                   DE
NMS       $\beta_5$ = 2.9; 3.7                                                                                   DE
OMS       $\beta_6$ = 0.22; 0.20                                                                                 DE



B. Decoding performance 



The frame error rate (FER) curves of some BF variants and MS variants are plotted in Fig. 2
for the (273,191) code.

Fig. 2 FER curves for (273,191) FG-LDPC code under various BF or MS variants

In the legend, the number in brackets stands for the maximum number of iterations $I_m$. It
is found that BF variants are in general inferior to MS variants in terms of performance.
Specifically, at the point FER=10 ^ LF-WBF leads WZ-WBF, NT-WBF and LZ-WBF about 
0.25, 0.58 and 0.6 dB, respectively. But it lags behind NAB, OMS and NMS by about 0.2, 0.26 and
0.32 dB, respectively. Further comparison between LF-WBF and IPWBF [16] shows that they
present similar decoding performance and are thus interchangeable. Therefore, among the available
BF variants, LF-WBF has the SNR region best matched to that of the MS variants.
Considering that LDPC codes commonly have a large enough minimum distance, cases seldom
occur where the BF variant results in an undetectable error but the MS variant decodes correctly. On
the other hand, there exist a few cases where the BF variant works but the MS variant fails. Thus, in
the form of a BF variant plus an MS variant, the hybrid decoding keeps at least the same
performance as the MS variant alone. However, for each combination, the degree of matching
between the two component decoders heavily impacts the overall decoding complexity.



For the (1023,781) code, the FER curves are plotted in Fig. 3. It is observed that when the block
length increases from 273 to 1023, the relation among the curves of BF variants and MS variants still
holds, except that the spacing among these curves shifts slightly. For instance, at the point
FER=10 there exists a gap of about 0.3 dB between LF-WBF and NMS, while LF-WBF exceeds LZ-WBF
by more than 0.3 dB. Also included are the curves of two serial approaches: LP-WBF and
SZ-WBF. The performance of LP-WBF and SZ-WBF with $I_m = 200$ approximates that of NT-WBF
and LF-WBF with $I_m = 20$, respectively. Meanwhile, the full loop detection [8], which proves
effective in avoiding decoding traps for serial BF variants, is used for both LP-WBF and
SZ-WBF.

Fig. 3 FER performance for (1023,781) FG-LDPC code under BF or MS variants

C. Convergence rate 

Since some applications require $I_m$ to be small, it is meaningful to investigate the
convergence rate of the various decoding schemes. At a typical point SNR=3.42dB (or $\sigma$=0.57)
of the (273,191) code, Table-III gives a performance comparison among the schemes under varied $I_m$.

It is seen that although $I_m = 3$ is too strict for all decoding schemes, each BF variant
reaches its individual decoding capability at the specified point within $I_m = 20$. That is, more
iterations after the 20-th iteration achieve little or no further decoding improvement, while MS variants
require $I_m$ to be at least 50 to fully decode the received sequences. Also included in Table-III is
the data of BP. Interestingly, at $I_m = 3$, BP yields the best decoding performance of all.
But its convergence rate is not satisfying. Even $I_m = 50$ is not sufficient for
BP decoding, because the performance improves from FER=1.6e-3 to 7.2e-4 when $I_m$ increases
to 200. For this reason, given a small $I_m = 20$, LF-WBF even slightly outperforms BP, as shown in
Table-III. Further simulation shows that LF-WBF with $I_m = 20$ performs better than BP in the
region where SNR is greater than 3.42 dB. Another noticeable point shown in Table-III is that the
performance of BP is generally inferior to that of MS variants, despite its high complexity. Therefore,
BP is less attractive as the second component decoder of a hybrid scheme. Taking
into account the fact that serial BF variants require a much larger $I_m$ than the above multi-bit BF

Table-III : Decoding performance (FER) of various schemes under varied
$I_m$ for (273,191) FG-LDPC code at SNR=3.42dB

Scheme    I_m = 3    I_m = 10    I_m = 20    I_m = 50    I_m = 200
LZ-WBF    9.2e-2     3.6e-2      3.6e-2      3.6e-2      3.6e-2
NT-WBF    5.9e-1     4.4e-2      3.8e-2      3.8e-2      3.8e-2
WZ-WBF    2.5e-2     9.6e-3      9.8e-3      9.8e-3      9.8e-3
LF-WBF    4.5e-2     4.2e-3      2.8e-3      2.4e-3      2.3e-3
NAB       1.1e-1     2.1e-3      7.6e-4      4.4e-4      4.4e-4
OMS       1.6e-2     9.6e-4      6.6e-4      5.0e-4      4.6e-4
NMS       1.1e-2     5.2e-4      3.8e-4      3.6e-4      3.4e-4
BP        9.4e-3     3.9e-3      2.9e-3      1.6e-3      7.2e-4




variants [14][16], and that LF-WBF performs the best among existing multi-bit BF variants,
LF-WBF plus some MS variant intuitively presents a competitive form of hybrid decoding scheme.
Similar observations hold when the comparison is extended to other, longer FG-LDPC codes.



Let $A_{ni}$ denote the average number of iterations of each decoding scheme. As seen in Fig. 4 for the
(1023,781) code, the $A_{ni}$ of NT-WBF stands out prominently while that of LZ-WBF varies slowly
with $E_b/N_0$; both effects are due to the algorithms themselves. In most of the SNR region of interest, all BF
variants except NT-WBF present an $A_{ni}$ comparable to that of MS variants, thus meeting condition
three discussed in Section II.

Fig. 4 Average number of iterations $A_{ni}$ of various decoding schemes for (1023,781)
FG-LDPC code



D. Computational complexity analysis 

Practically, any BF variant followed by an MS variant will yield the same decoding performance
as the latter alone. For instance, LF-WBF plus an MS variant performs almost equally
to LZ-WBF plus an MS variant, regardless of the fact that LF-WBF is far superior to LZ-WBF.
However, the computational complexity differs enormously among the hybrid schemes.
Generally, it is hard to describe accurately the complexity required by each decoding scheme,
so data obtained in the simulations is presented to support our viewpoints where necessary.

Let $d_v$, $d_c$ denote the column and row weights of the parity check matrix $H$, respectively. Then
for each BF variant, the complexity roughly consists of three parts: preprocessing, updating the BF
function and selecting the flipping bits. To the best of our knowledge, the complexity of preprocessing
and initialization is largely omitted in the existing literature. However, the following analysis and
simulation will show that it contributes substantially to the complexity in the very high SNR region.
Ignoring simple binary operations and the small number of real multiplications sometimes involved,
it suffices to address the dominant real additions for each BF variant, assuming one real
comparison is treated as one real addition.

At the preprocessing stage, LP-WBF and NT-WBF need about $N(2d_c - 3)$ comparisons
to compute the min and max terms of (2). Similarly, LZ-WBF and WZ-WBF each require about
$N(d_c - 1)$ comparisons to compute the min term of (5) and (6), respectively. Besides
the comparisons needed for LP-WBF, both SZ-WBF and LF-WBF require extra comparisons to obtain the $w_{i,k}$ term
of (4). Compared with SZ-WBF, LF-WBF requires about $N \log_2 \lfloor \beta_4 N \rfloor + N$ more comparisons
to determine the bit with the $\lfloor \beta_4 N \rfloor$-th smallest magnitude and to mark the delay-flipping bits.

Prior to updating, the BF function of each bit is initialized with $d_v - 1$ additions for each BF
variant. For multi-bit BF variants, there are two ways of updating the BF function of the pertinent
bits from the second iteration on. One is to invoke $d_v d_c$ additions per flipped bit to update the
affected BF functions; another is to update the BF function of each bit after comparing its column of $H$ with
the syndromes before and after the latest iteration. The latter is more economical, since
two flipped bits in the same check sum cause two extra additions in the former approach, which are
avoided in the latter. For serial BF variants, a total of $d_v d_c$ terms are used to update the BF functions of
the affected bits per iteration.

To decide which bits to flip, each BF variant has its own approach. For LP-WBF and SZ-WBF,
it just requires $N - 1$ comparisons to find the bit with the smallest BF function value; for
WZ-WBF and LF-WBF, $d_c - 1$ comparisons are required per unsatisfied check to collect flipping
signals for each bit; for NT-WBF, the complexity at this stage equals that of selecting the smallest
few (say, 5) elements of an unordered array. Notably, no computation is required for LZ-WBF, since
it simply flips those bits with positive BF function values.

To sum up, Table-IV gives the complexity composition of each BF variant. In the table, $A_{nb}$
denotes the average number of selected bits per iteration for NT-WBF, $A_{nc}$ is the average number
of updated BF function terms per bit per iteration, and $A_{ns}$ is the average number of unsatisfied checks
Table-IV : Approximate real additions per sequence for various
decoding schemes of FG-LDPC codes

Schemes     Preprocess                                              Update BF function (include initialization)    Select bit(s) to flip
LZ-WBF      $N(d_c - 1)$                                            $N(d_v - 1) + (A_{ni} - 1) N A_{nc}$           (none)
NT-WBF      $N(2d_c - 3)$                                           $N(d_v - 1) + (A_{ni} - 1) N A_{nc}$           $A_{ni} N \log_2 A_{nb}$
WZ-WBF      $N(d_c - 1)$                                            $N(d_v - 1) + (A_{ni} - 1) N A_{nc}$           $A_{ni} A_{ns} (d_c - 1)$
LF-WBF      $N(2d_c - 1 + \log_2 \lfloor \beta_4 N \rfloor)$        $N(d_v - 1) + (A_{ni} - 1) N A_{nc}$           $A_{ni} A_{ns} (d_c - 1)$
SZ-WBF      $N(2d_c - 2)$                                           $N(d_v - 1) + (A_{ni} - 1) d_v d_c$            $A_{ni} (N - 1)$
LP-WBF      $N(2d_c - 3)$                                           $N(d_v - 1) + (A_{ni} - 1) d_v d_c$            $A_{ni} (N - 1)$
NAB         $A_{ni}(2N d_v + M(\lceil \log_2 d_c \rceil - 2))$ in total
OMS, NMS    $A_{ni}(N(4d_v - 3) + M(\lceil \log_2 d_c \rceil - 2))$ in total

per iteration. Also included are the complexity expressions of NAB, OMS and NMS as reported
in [7], wherein $\lceil \cdot \rceil$ is the ceiling function.

For the (1023,781) code, $N = 1023$ and $d_v = d_c = 32$ [3]. Assume $I_m = 20$ for multi-bit BF variants and
$I_m = 200$ for MS variants and serial BF variants to ensure full decoding convergence. At a typical
point of SNR=3.28dB (or $\sigma = 0.555$), Table-V presents the figures observed in simulation,
among which the last column is the number of real additions according to the expressions of
Table-IV. Notably, the last two rows of Table-V also give the complexity of two instances of hybrid
decoding schemes.
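
As a cross-check, the expressions of Table-IV can be evaluated with the simulation averages of Table-V. The sketch below does this for the multi-bit BF variants and for 'LF-WBF+NMS', assuming $N = M = 1023$, $d_v = d_c = 32$, $\beta_4 = 0.04$, and reading the per-iteration cost of NMS as $N(4d_v - 3) + M(\lceil \log_2 d_c \rceil - 2)$; the helper name `bf_additions` is hypothetical.

```python
import math

N = M = 1023
DV = DC = 32

def bf_additions(preprocess, a_ni, a_nc, select):
    """Total real additions of a multi-bit BF variant, following Table-IV."""
    update = N * (DV - 1) + (a_ni - 1) * N * a_nc
    return preprocess + update + select

# Arguments: preprocess term, A_ni, A_nc, selection term (averages from Table-V).
lz = bf_additions(N * (DC - 1), 4.70, 8.11, 0.0)
nt = bf_additions(N * (2 * DC - 3), 9.61, 7.72, 9.61 * N * math.log2(9.73))
wz = bf_additions(N * (DC - 1), 4.48, 10.41, 4.48 * 348.01 * (DC - 1))
lf = bf_additions(N * (2 * DC - 1 + math.log2(math.floor(0.04 * N))),
                  4.74, 10.10, 4.74 * 373.63 * (DC - 1))

# One NMS iteration, and the hybrid that adds 0.88 NMS iterations on average.
nms_per_iteration = N * (4 * DV - 3) + M * (math.ceil(math.log2(DC)) - 2)
hybrid_lf_nms = lf + 0.88 * nms_per_iteration

if __name__ == "__main__":
    for name, value in [("LZ-WBF", lz), ("NT-WBF", nt), ("WZ-WBF", wz),
                        ("LF-WBF", lf), ("LF-WBF+NMS", hybrid_lf_nms)]:
        print(f"{name}: {value / 1e5:.2f}e5 real additions")
```

Under these assumptions the computed totals agree, to within rounding, with the last column of Table-V for the rows evaluated here.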

After studying Table-V, we find that the class of BF variants demonstrates a substantial



Table-V : Complexity comparison per sequence for various decoding
schemes of (1023,781) FG-LDPC code at SNR=3.28dB

Scheme        A_ni     A_ns      A_nb    A_nc     Number of real additions (x 1e5)
LZ-WBF        4.70     N/A       N/A     8.11     0.94
NT-WBF        9.61     N/A       9.73    7.72     1.94
WZ-WBF        4.48     348.01    N/A     10.41    1.49
LF-WBF        4.74     373.63    N/A     10.10    1.95
SZ-WBF        49.08    N/A       N/A     N/A      1.95
LP-WBF        68.66    N/A       N/A     N/A      2.34
NAB           5.53     N/A       N/A     N/A      3.79
OMS           4.47     N/A       N/A     N/A      5.78
NMS           3.77     N/A       N/A     N/A      4.93
LZ-WBF+NMS    (Data for LZ-WBF) + (NMS with A_ni = 1.88)    3.40
LF-WBF+NMS    (Data for LF-WBF) + (NMS with A_ni = 0.88)    3.10
advantage over MS variants in terms of complexity. Among the BF variants, LZ-WBF presents
the least complexity due to its fast convergence, low-complexity preprocessing and the absence of any
bit selection cost. Despite this simplicity, in the low and moderate SNR regions, the combination
of LZ-WBF and NMS requires more complexity than that of LF-WBF and NMS, because
the former combination demands one more iteration of NMS on average, as shown in
Table-V. Under the condition of offering equivalent performance, the last three rows of Table-V
illustrate that both hybrid decoding schemes save much complexity relative to their
second component decoder alone.



To better illustrate the complexity comparison over the whole SNR region, Fig. 5 presents complexity
ratio curves for the (273,191) and (1023,781) codes. Taking the complexity of NMS as the
benchmark, the complexity ratio is defined as the ratio of the complexity of a specified hybrid
scheme to that of NMS. For NMS, since another $A_{ni} N d_v$ divisions are actually required [7], we
roughly treat as its total complexity the sum of this expression and the related formula in Table-IV.
In the very low SNR region, neither hybrid scheme shows much advantage, because
most decodings fall to NMS. However, with increasing SNR, both hybrid schemes yield
more and more complexity reduction, resulting from the greater involvement of LZ-WBF or LF-WBF
in decoding. For the short (273,191) code, the combination 'LZ-WBF+NMS' overtakes
'LF-WBF+NMS' at the point SNR=3.05dB, while this crossover moves to SNR=3.45dB
for the (1023,781) code. This suggests that the intersection of the two schemes moves to
a higher SNR with longer block length.

Fig. 5 Complexity ratio curves of Hybrid one: LZ-WBF(20)+NMS(200)
and Hybrid two: LF-WBF(20)+NMS(200) for (273,191) and (1023,781) FG-LDPC
codes, each relative to NMS(200), versus $E_b/N_0$ (dB)

Let $C_{nms}$, $C_{lzn}$ and $C_{lfn}$ denote the complexity of the above three decoding
schemes, namely NMS, 'LZ-WBF+NMS' and 'LF-WBF+NMS'. For sufficiently long FG-LDPC codes, to seek
the asymptotic complexity ratios in the very high SNR region, the following approximations are
derived based on Table IV:

\[
\left\{
\begin{aligned}
\frac{C_{lzn}}{C_{nms}} &= \frac{d_c + d_v - 2 + (A_{ni}-1)A_{nc}}{A_{ni}\,(5d_v + \lceil \log_2 d_c \rceil - 5)} \approx \frac{2}{5A_{ni}}, \\
\frac{C_{lfn}}{C_{nms}} &= \frac{2d_c + d_v - 2 + \log_2\lceil [\text{illegible}]\,N \rceil + (A_{ni}-1)A_{nc} + A_{ni}(d_c-1)A_{ns}/N}{A_{ni}\,(5d_v + \lceil \log_2 d_c \rceil - 5)} \approx \frac{9 + A_{ni}}{15A_{ni}},
\end{aligned}
\right.
\]

wherein the following simulation results are exploited: $d_v = d_c$, and both are large
compared with the other terms; $A_{ni}$ of the various schemes ranges in [1, 2] and tends to be
close across schemes; $A_{nc}$ of LZ-WBF or LF-WBF is small compared to $d_v$; $A_{ns}/N$ is
about one third. A similar approach can be used to derive the complexity ratios of other hybrid
combinations.
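As a quick numerical check of the two approximations, the following sketch evaluates 2/(5 A_ni) for 'LZ-WBF+NMS' and (9 + A_ni)/(15 A_ni) for 'LF-WBF+NMS' at the ends of the reported range [1, 2], and compares the first with its unapproximated form. The function names are ours, and the values d_c = d_v = 32, A_ni = 1.5 and A_nc = 2 are illustrative assumptions rather than simulation results.

```python
from math import ceil, log2


def nms_term(a_ni, d_c, d_v):
    """Common denominator of both ratios: A_ni * (5*d_v + ceil(log2 d_c) - 5)."""
    return a_ni * (5 * d_v + ceil(log2(d_c)) - 5)


def lz_ratio_exact(a_ni, a_nc, d_c, d_v):
    """Unapproximated ratio C_lzn / C_nms."""
    return (d_c + d_v - 2 + (a_ni - 1) * a_nc) / nms_term(a_ni, d_c, d_v)


def lz_ratio_asymptotic(a_ni):
    """Approximation 2 / (5 * A_ni), valid for large d_v = d_c."""
    return 2 / (5 * a_ni)


def lf_ratio_asymptotic(a_ni):
    """Approximation (9 + A_ni) / (15 * A_ni), using A_ns / N of about 1/3."""
    return (9 + a_ni) / (15 * a_ni)


if __name__ == "__main__":
    for a_ni in (1.0, 2.0):
        print(a_ni, lz_ratio_asymptotic(a_ni), lf_ratio_asymptotic(a_ni))
    # Assumed example values: d_c = d_v = 32, A_ni = 1.5, A_nc = 2.
    print(lz_ratio_exact(1.5, 2.0, 32, 32), lz_ratio_asymptotic(1.5))
```

Since 2/(5 A_ni) = 6/(15 A_ni) is smaller than (9 + A_ni)/(15 A_ni) for any positive A_ni, the approximations agree with the observation that 'LZ-WBF+NMS' is the cheaper combination at sufficiently high SNR.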

E. Hardware complexity 

At first glance, the proposed hybrid schemes seem to add much hardware complexity relative to
their second component decoder alone. However, most of the hardware can be shared between the
two component decoders. Taking 'LF-WBF+NMS' as an example, and assuming NMS hardware is
available, the min and max operations in the preprocessing phase of LF-WBF, and the collection
of flipping signals in its flipping-bit selection phase, can be accomplished via the check node
logic of NMS, while the initialization step can use the bit node logic of NMS. Thus, compared
with NMS, 'LF-WBF+NMS' only requires a few more integer counters and some interconnection
logic. Therefore, the extra hardware complexity of the hybrid decoding schemes is largely
negligible.

V. Conclusions 

For FG-LDPC codes, the concatenation of a BF variant and an MS variant proves effective in
decoding over a wide range of SNR, achieving the performance of the MS variant with
substantially reduced computational complexity. While LZ-WBF plus an MS variant has its
advantage in the high SNR region of interest, the proposed LF-WBF plus an MS variant
demonstrates better complexity saving in the rest of the SNR region, owing to the
well-overlapped waterfall regions of its two component decoders. Evidently, if we can gear
among these hybrid schemes according to the SNR, the decoding will be more powerful and robust.

For BP decoding, it is known that the flooding schedule is not optimal. Sharon et al. [23] [24]
[25] proved that a serial message passing schedule, implemented by fully utilizing the
available updated messages, can halve the average number of iterations of the flooding schedule
without performance penalty. However, it risks higher decoding latency. In contrast, our hybrid
scheme yields a good tradeoff among performance, complexity and latency.

References 

[1] D. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399-431, 1999.
[2] S. Chung, G. Forney Jr, T. Richardson, and R. Urbanke, "On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit," IEEE Communications Letters, vol. 5, no. 2, pp. 58-60, 2001.
[3] Y. Kou, S. Lin, and M. Fossorier, "Low-density parity-check codes based on finite geometries: a rediscovery and new results," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2711-2736, 2001.
[4] H. Tang, J. Xu, S. Lin, and K. Abdel-Ghaffar, "Codes on finite geometries," IEEE Transactions on Information Theory, vol. 51, no. 2, pp. 572-596, 2005.
[5] R. Gallager, "Low-density parity-check codes," IEEE Transactions on Information Theory, vol. 8, no. 1, pp. 21-28, 1962.
[6] R. Lucas, M. Fossorier, K. Yu, and S. Lin, "Iterative decoding of one-step majority logic deductible codes based on belief propagation," IEEE Transactions on Communications, vol. 48, no. 6, pp. 931-937, 2000.
[7] Z. Liu and D. Pados, "A decoding algorithm for finite-geometry LDPC codes," IEEE Transactions on Communications, vol. 53, no. 3, pp. 415-421, Mar. 2005.
[8] M. Shan, C. Zhao, and M. Jiang, "Improved weighted bit-flipping algorithm for decoding LDPC codes," IEE Proceedings-Communications, vol. 152, no. 6, pp. 919-922, 2005.
[9] A. Nouh and A. Banihashemi, "Bootstrap decoding of low-density parity-check codes," IEEE Communications Letters, vol. 6, no. 9, pp. 391-393, 2002.
[10] A. Nouh and A. Banihashemi, "Reliability-based schedule for bit-flipping decoding of low-density parity-check codes," IEEE Transactions on Communications, vol. 52, no. 12, pp. 2038-2040, 2004.
[11] J. Zhang and M. Fossorier, "A modified weighted bit-flipping decoding of low-density parity-check codes," IEEE Communications Letters, vol. 8, no. 3, pp. 165-167, 2004.
[12] M. Jiang, C. Zhao, Z. Shi, and Y. Chen, "An improvement on the modified weighted bit flipping decoding algorithm for LDPC codes," IEEE Communications Letters, vol. 9, no. 9, pp. 814-816, 2005.
[13] X. Wu, C. Zhao, and X. You, "Parallel Weighted Bit-Flipping Decoding," IEEE Communications Letters, vol. 11, no. 8, pp. 671-673, 2007.






[14] T. Ngatched, F. Takawira, and M. Bossert, "An improved decoding algorithm for finite-geometry LDPC codes," IEEE Transactions on Communications, vol. 57, no. 2, pp. 302-306, 2009.
[15] J. Li and X. Zhang, "Hybrid Iterative Decoding for Low-Density Parity-Check Codes Based on Finite Geometries," IEEE Communications Letters, vol. 12, no. 1, pp. 29-31, 2008.
[16] G. Li and G. Feng, "Improved parallel weighted bit-flipping decoding algorithm for LDPC codes," IET Communications, vol. 3, no. 1, pp. 91-99, 2009.
[17] N. Wiberg, Codes and decoding on general graphs. Department of Electrical Engineering, Linköping University, 1996.
[18] M. Fossorier, M. Mihaljevic, and H. Imai, "Reduced complexity iterative decoding of low-density parity check codes based on belief propagation," IEEE Transactions on Communications, vol. 47, no. 5, pp. 673-680, 1999.
[19] J. Chen, A. Dholakia, E. Eleftheriou, M. Fossorier, and X. Hu, "Reduced-Complexity Decoding of LDPC Codes," IEEE Transactions on Communications, vol. 53, no. 8, pp. 1288-1299, 2005.
[20] M. Ardakani and F. Kschischang, "Gear-shift decoding," IEEE Transactions on Communications, vol. 54, no. 7, pp. 1235-1242, 2006.
[21] R. Storn and K. Price, "Differential evolution - a simple and efficient adaptive scheme for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341-359, 1997.
[22] W. Xiaofu, L. Cong, J. Ming, X. Enyang, Z. Chunming, and Y. Xiaohu, "Towards understanding weighted bit-flipping decoding," in IEEE Int. Symp. Inform. Theory, vol. 7, pp. 1561-1566.
[23] D. Levin, E. Sharon, and S. Litsyn, "Lazy scheduling for LDPC decoding," IEEE Communications Letters, vol. 11, no. 1, pp. 70-72, 2007.
[24] H. Kfir and I. Kanter, "Parallel versus sequential updating for belief propagation decoding," Physica A: Statistical Mechanics and its Applications, vol. 330, no. 1-2, pp. 259-270, 2003.
[25] E. Sharon, S. Litsyn, and J. Goldberger, "Efficient Serial Message-Passing Schedules for LDPC Decoding," IEEE Transactions on Information Theory, vol. 53, no. 11, pp. 4076-4091, 2007.



